
Decoding Science: Repeat After Me

Feature Story


Last update May 12, 2021


Why scientists repeat studies — and why that’s harder than you might think

You’re rocking your favorite video game and just made it to Level 8. Next time you log in, you find that the system failed to save your progress, and you’ve got to start over. If you follow the same exact steps again, will you get back to Level 8? 

Reproducibility — getting consistent results using the same data and code — and replicability — getting consistent results following the same methods but using new data — are hallmarks of good science. Reproducing studies encourages researchers to share their data and code, while replicating them verifies that the methods are sound and the results hold up. As we gain confidence in our methods and findings, we create a foundation from which to experiment and build more knowledge.

It seems simple: Use the same data and code, and you should get the same results. But as it turns out, reproducibility can be rather tricky. 

With the complex computational methods scientists use today, it can be hard to perform a study exactly the same way even if you have the data and code in hand. Miss a step or sub in a different version of the software, and you may not get the same results. 
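
As one illustration of the kind of safeguard that helps (a hypothetical sketch in Python, not a practice described by any of the researchers quoted here), an analysis script can record the exact software versions that produced a result, so anyone re-running it knows which environment to recreate. The file name and the use of the numpy library below are assumptions made for the example.

    # Hypothetical sketch: write the software versions used for an analysis to a
    # small file alongside the results, so the run can be reproduced later.
    import json
    import platform
    import sys

    import numpy as np  # stand-in for whatever analysis libraries a study relies on


    def save_environment_record(path="environment.json"):
        """Record the Python, platform, and library versions used for this run."""
        record = {
            "python": sys.version,
            "platform": platform.platform(),
            "numpy": np.__version__,
        }
        with open(path, "w") as f:
            json.dump(record, f, indent=2)


    if __name__ == "__main__":
        save_environment_record()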

By taking steps to support reproducibility and overcome common pitfalls, scientists are working to make science stronger. 

A 20-year search for 12,000 years of data

“It’s really good to do public archiving [...] as close to the conclusion of the study or publication of the results as possible, because you actually never know whether a particular piece of work will turn out to be an important scientific stepping stone or not,” says Michael Evans, a climate scientist at the University of Maryland, College Park. 

Evans would know. In 2001, he was part of a team of scientists that published an analysis of the sun’s influence on Earth’s climate over the past 12,000 years. Two decades later, he’s still fielding requests for the data and code used in the study. The work, as it turns out, was one of those important scientific stepping stones. Many have sought to replicate it — to perform similar studies and see if the scientific conclusions are similar — in the years since. 

Gerard Bond, a celebrated geologist, was the study’s lead author. After Bond passed away in 2005, Evans began getting requests for the data and code behind the study, aspects of which only Bond had full knowledge. Bond’s wife and collaborator, Rusty Lotti-Bond, kindly shipped Evans a copy of the scientist’s computer hard drive, but Evans didn’t have the same software Bond had used to process the data, and there were gaps where certain modifications didn’t leave a digital record.

Chipping away at the project off and on for more than a year, Evans reverse-engineered the time series plotted in key figures in the paper as well as he could, and made it all publicly available in 2008.

But even that wasn’t enough. Bond had filtered some of the data, a step Evans had retained when archiving the work. In 2020, Evans received a request for the raw data, setting off another search. At one point, he found himself reaching out to colleagues of a second scientist, Benjamin Flower, who had started working with Bond’s data years earlier and then passed away before completing any studies with it. 

Flower’s colleagues were able to dig up some of the data, bringing the legacies of two deceased scientists together to help advance tomorrow’s climate science. 

“When people started asking me about the data, I realized [archiving it] is something that we ought to do, and I considered it as a way of showing my respect for Gerard, a great scientist,” said Evans. “I don’t think I expected to be, 20 years later, still trying to track down bits of it.”

Spaghetti code and shifting sands

Evans’ quest is all too familiar to Lorena Barba, a computational scientist at George Washington University. As a graduate student, she spent a full year piecing together computer code created by a previous student. “It was hell,” she reflected in an article in Science. “Now that I run my own lab, I make sure that my students don’t have to go through that.” 

If two researchers look at a similar problem and come to different conclusions, pinpointing the source of the disagreement requires understanding how each scientist arrived at their results, Barba says. But full transparency on research methods is not as easy as it sounds. 

“It’s not like in the times of Marie Curie, where you’re writing notes in your notebook,” said Barba. “Every single result has passed through complex layers of data analysis and model building and software. Unless the research group has very good practices of writing code so that it can be reused and documenting the software as it’s being written, you end up having to untangle spaghetti code.”

Computational methods are so complex now that it’s not even a given that running the same inputs through the same code on the same machines will always result in the same answer. This represents a fundamentally different way of doing — and reproducing — science today than even 10-20 years ago, a shift Barba expects to continue as more researchers tap into high-performance computing, machine learning, and artificial intelligence. 
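
A toy example (our illustration, not one drawn from Barba's work) shows how even "same inputs, same code" can wobble: floating-point addition is not associative, so summing the same numbers in a different order, as parallel programs routinely do, can change the answer in its last digits.

    # Toy illustration (an assumption for this article, not code from any study):
    # floating-point addition is not associative, so the order of additions matters.
    import random

    random.seed(0)
    values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

    forward_sum = sum(values)            # one ordering of the additions
    reverse_sum = sum(reversed(values))  # the same numbers, added in reverse order

    print(forward_sum == reverse_sum)      # frequently False
    print(abs(forward_sum - reverse_sum))  # a tiny, but nonzero, difference

The gap is vanishingly small in this toy case, but over long chains of computation such discrepancies can compound until two runs no longer agree.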

But even as the challenge grows ever more complex, scientists are beginning to make headway with concerted efforts to document and openly share data and code. “Things are changing a bit now, but we have to recognize that it takes more work and more time to do research reproducibly,” said Barba. 

Correcting the record

A great deal of scientific knowledge comes in tiny increments, from taking what’s been done before and bringing it one step further. Having a reference case — a study or measurement that has stood up to repeated tests — can be essential to moving knowledge forward. The more the science has been tested, the more confident we can be in the conclusions. 

What if a study doesn’t hold up? Evans has firsthand experience with that, too. He recalls once receiving an email from a colleague pointing out an error in his team’s code, which the team had made available in a public archive.

“After a minute of feeling mortified, you want to send them roses and say thank you for correcting the record,” he said. “It makes it possible for others to build on something that is a little better, or at least less wrong, than it was before.”
